- Path: lr46pstn.lr.tudelft.nl!koen
- From: koen@lr46pstn.lr.tudelft.nl (Koen D'Hondt)
- Newsgroups: comp.lang.c
- Subject: Using regexec for multiple patterns. How???
- Date: 28 Feb 1996 23:44:51 GMT
- Organization: Ripley Software Development
- Sender: koen@dutlhs1.lr.tudelft.nl
- Distribution: all
- Message-ID: <4h2pdj$fui@mo6.rc.tudelft.nl>
- NNTP-Posting-Host: lr46pstn.lr.tudelft.nl
-
- Hi All,
-
- I've been experimenting with regcomp and friends to do pattern matching in
- text, but I've run into a few problems trying to search for multiple
- occurrences of a search pattern. regcomp and regexec are the GNU versions;
- the system I'm using is Linux 1.3.63.
-
- The first problem is that regcomp doesn't seem to support the [:XXXXX:]
- character classes. I tried something like regcomp(regex, "[:digit:]+", 0),
- but the compiled pattern matches any word containing the characters
- :digit, so digits aren't matched. Using [0-9]+ works.
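
For reference, POSIX only recognizes a class name such as [:digit:] inside a bracket expression, so the pattern would normally be written [[:digit:]]+; a bare [:digit:] is itself read as a bracket expression listing the characters ':', 'd', 'i', 'g' and 't'. Note also that with cflags set to 0 the pattern is compiled as a basic RE, where + is not a repetition operator under POSIX rules, so REG_EXTENDED is assumed below. The helper name re_matches is hypothetical and only serves this sketch:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of the original program): returns 1 if
 * `pattern`, compiled as a POSIX extended regex, matches somewhere in
 * `text`, 0 if it does not, and -1 if the pattern fails to compile. */
int re_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, re_matches("[[:digit:]]+", "page 508") should report a match, while the same pattern against a text without digits should not.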
-
- I've looked at regex.c, and support for character classes
- should be there, regardless of which flags are passed to regcomp.
-
- The second problem is searching for multiple occurrences of a pattern
- in a string. As I understand the man page and regex.info, calling regexec
- like this: regexec(regex, text, 10, pmatch, 0) should give me up
- to 10 matches, with the positions returned in pmatch[]. Unused pmatch
- elements are supposed to have the rm_so field set to -1, but that is not
- what I see: unused elements are set to a silly value like 10^9 or so.
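
For reference, POSIX does not define nmatch and pmatch[] as a list of successive matches. A single regexec call reports one match: pmatch[0] is the whole match and pmatch[1], pmatch[2], ... are the parenthesised subexpressions of that same match, with unused slots set to -1 on a conforming implementation. A minimal sketch of that behaviour, assuming REG_EXTENDED and using the hypothetical helper name count_filled_slots:

```c
#include <regex.h>

/* Hypothetical helper (not part of the original program): runs `pattern`
 * once against `text` and returns how many of 10 pmatch[] slots were
 * filled. Slot 0 is the whole match, slots 1.. are the parenthesised
 * subexpressions. Returns -1 on a compile error or when nothing matches. */
int count_filled_slots(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t pmatch[10];
    int i, filled = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 10, pmatch, 0) != 0) {
        regfree(&re);
        return -1;
    }
    /* POSIX requires unused slots to have rm_so (and rm_eo) set to -1 */
    for (i = 0; i < 10; i++)
        if (pmatch[i].rm_so != -1)
            filled++;
    regfree(&re);
    return filled;
}
```

A pattern without parentheses, such as "Ents", should therefore fill only slot 0 no matter how often the word occurs in the text, whereas "(E)(nts)" should fill three slots.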
-
- Here's the thing I wrote to get acquainted with regcomp and regexec:
- /*****
- * test_regex.c : regular expression search.
- *****/
-
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>
- #include <regex.h>
-
- /* adjacent string literals: a newline inside a single literal is not standard C */
- static char *text =
- "\n"
- "Pippin looked behind. The number of Ents had grown --or what was happening?.\n"
- "Where the dim bare slopes that they had crossed lie, he thought he saw groves\n"
- "of trees. But they were moving! Could it be that the trees of Fangorn were\n"
- "awake, and the forest was rising, marching over the hills to war? He rubbed\n"
- "his eyes wondering if sleep and shadow had deceived him; but the great grey\n"
- "shapes moved steadily onward. There was a noise like wind in many branches.\n"
- "The Ents were drawing near the crest of the ridge now, and all song had\n"
- "ceased. Night fell, and there was silence: nothing was to be heard save a\n"
- "faint quiver of the earth beneath the feet of the Ents, and a rustle, the\n"
- "shade of a whisper as of many drifting leaves. At last they stood upon\n"
- "the summit, and looked down into a dark pit: the great cleft at the end of\n"
- "the mountains: Nan Curunir, the Valley of Saruman.\n"
- "'Night lies over Isengard', said Treebeard.\n"
- "Quote from The Lord of the Rings, page 508.";
-
- /****
- * compile a regular expression, returning the compiled pattern on success
- * or NULL on failure
- *****/
- regex_t
- *compile_re(char *re_string)
- {
-     int reg_err;
-     char buf[128];
-     regex_t *regex = NULL;
-
-     regex = (regex_t*)malloc(sizeof(regex_t));
-     if(regex == NULL)    /* out of memory */
-         return(NULL);
-
-     if((reg_err = regcomp(regex, re_string, REG_EXTENDED))!=0)
-     {
-         regerror(reg_err, regex, buf, sizeof(buf));
-         fprintf(stderr,"compile_re failed: %s\n", buf);
-         free(regex);
-         return(NULL);
-     }
-     return(regex);
- }
-
- /****
- * go do a regex search on the given string with the given pattern
- * Count indicates how many times we should search.
- ****/
- int
- search_re(regex_t *regex, char *search_text, int count)
- {
-     regmatch_t *pmatch;
-     char *chPtr, buf[128];
-     int i, result;
-
-     pmatch = (regmatch_t*)malloc(count * sizeof(regmatch_t));
-     if(pmatch == NULL)    /* out of memory */
-         return(-1);
-
-     result = regexec(regex, search_text, count, pmatch, 0);
-     if(result == REG_NOMATCH)
-     {
-         printf("No match found\n");
-         free(pmatch);
-         return(-1);
-     }
-     /* this causes a SIGSEGV when count > 1 */
-     for(i = 0; i < count && pmatch[i].rm_so != -1; i++)
-     {
-         size_t len = pmatch[i].rm_eo - pmatch[i].rm_so;
-
-         if(len > sizeof(buf) - 1)    /* never copy more than buf can hold */
-             len = sizeof(buf) - 1;
-         memset(buf, 0, sizeof(buf));    /* also guarantees NUL termination */
-         chPtr = &search_text[pmatch[i].rm_so];
-         strncpy(buf, chPtr, len);
-         printf("%i: start at %i, end at %i (%s)\n",
-             i, (int)pmatch[i].rm_so, (int)pmatch[i].rm_eo, buf);
-     }
-     printf("No of matches found: %i\n",i);
-     free(pmatch);
-     return(0);
- }
-
- int main(int argc, char **argv)
- {
-     regex_t *preg;
-
-     printf("regex testing, text to search:\n");
-     printf("%s\n", text);
-
-     /* simple search, 1 occurrence */
-     printf("Searching for Saruman\n");
-     if((preg = compile_re("Saruman"))!= NULL)
-     {
-         search_re(preg, text, 1);
-         regfree(preg);
-         free(preg);
-     }
-     /* simple search, should find 3 occurrences */
-     printf("Searching for Ents\n");
-     if((preg = compile_re("Ents")) != NULL)
-     {
-         search_re(preg, text, 10);
-         regfree(preg);
-         free(preg);
-     }
-     return(0);
- }
-
- FYI: I've tried it with the regcomp and regexec supplied in libc, and with
- a compiled version of regex.c; both give the same (non)result.
-
- I now have a version which calls regexec once for each match
- I want, supplying an updated search text every time a match has been found,
- but in my opinion it would be a lot easier if regexec could do it all
- at once.
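
The repeated-call approach described above might look roughly like the sketch below. It is not the version referred to in the post: the function name find_all, its parameters, and the choice to return byte offsets are all assumptions made for illustration. REG_NOTBOL is passed on later calls because the advanced pointer no longer marks the true start of the string.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not the poster's code): stores the byte offsets of
 * up to `max` successive, non-overlapping matches of `pattern` in `text`
 * into starts[] and returns how many were found, or -1 if the pattern
 * does not compile. */
int find_all(const char *pattern, const char *text, size_t *starts, int max)
{
    regex_t re;
    regmatch_t m;
    size_t offset = 0;
    int n = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Each call searches the remainder of the text. After the first call
     * text + offset is not the real start of the string, so REG_NOTBOL
     * stops ^ from matching there. */
    while (n < max &&
           regexec(&re, text + offset, 1, &m, offset ? REG_NOTBOL : 0) == 0) {
        /* rm_so and rm_eo are relative to text + offset */
        starts[n++] = offset + (size_t)m.rm_so;
        if (m.rm_eo == m.rm_so) {
            /* empty match: step over one character to avoid looping forever */
            if (text[offset + (size_t)m.rm_eo] == '\0')
                break;
            offset += (size_t)m.rm_eo + 1;
        } else {
            offset += (size_t)m.rm_eo;
        }
    }
    regfree(&re);
    return n;
}
```

Called as find_all("Ents", text, starts, 10) with a size_t starts[10] array, this should report one offset per occurrence and stop as soon as regexec no longer finds a match.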
-
- Greets,
- Koen D'Hondt.
-
- --
- Koen D'Hondt Niels Hilbrink
- koen@dutlhs1.lr.tudelft.nl niels@dutlcc3.lr.tudelft.nl
-
- Ripley Software Development finger niels@dutlcc3.lr.tudelft.nl for more info.
-